ABSTRACT

This question and answer sheet describes the legislative
background of current accountability requirements for
English-as-a-Second-Language (ESL) programs, the issues involved in testing
level gain, and critical questions whose answers can lead the field forward.
It focuses on the following: what the legislation requires; how states are
meeting these requirements; issues in testing level gain (NRS level
descriptors, standardized testing, and performance assessment); what attempts
at standardizing performance assessment are being undertaken (e.g., Ohio is
developing a uniform portfolio system of performance assessment, and Colorado
developed a certificate system based on performance assessments that was
discarded in favor of a standardized test for NRS reporting); and critical
issues that need to be addressed (e.g., what should be counted as success and
how it should be measured; how well the NRS scale facilitates the reporting
of learner progress; what the cost is in time, staffing, and funds to
effectively assess and document learning outcomes; and what changes are
needed in program design and staff development to ensure that assessment
tools are reliably used). (Contains 10 references.) (Adjunct ERIC
Clearinghouse for ESL Literacy Education) (SM)

Issues in Accountability and Assessment for Adult ESL Instruction

Throughout the 1990s, legislation increasingly
required programs receiving federal funding to be
more accountable for what they do. For adult
education, these requirements have intensified the debate
among practitioners, researchers, and policy makers as to
what constitutes success and how to measure it. At the
same time, the number of English language learners
enrolled in adult education programs has been growing,
particularly in areas of the country that have not previously
seen many immigrants (Pugsley, 2001). New programs are
being established to meet the demand for English as a
second language (ESL) instruction, and existing programs
are expanding.

This Q&A describes the legislative background of current
accountability requirements for ESL programs, the
issues involved in testing level gain, and critical questions
whose answers can lead the field forward.

What does legislation require?

The Adult Education and Family Literacy Act (Title II of 
the Workforce Investment Act [WIA] of 1998) requires each 
state to negotiate target levels of performance with the U.S. 
Department of Education (ED) for three core indicators: 

1. demonstrated improvements in skill levels in reading,
writing, and speaking the English language, numeracy,
problem solving, English language acquisition, and other
literacy skills;

2. placement in, retention in, or completion of
postsecondary education, training, unsubsidized employment,
or career advancement; and

3. receipt of a secondary school diploma or its recognized 
equivalent. 

ED established the National Reporting System for Adult 
Education (NRS) to define how states are required to report 
their data. NRS identifies 12 functioning level descriptors, 
6 for Adult Basic Education and 6 for English as a Second 
Language. The ESL level descriptors describe what a learner 
knows and can do in three areas: (a) speaking and listening, 
(b) reading and writing, and (c) functional and workplace 
skills (U.S. Department of Education, 1999-2001). These 
level descriptors define English language proficiency across 
six levels, from ESL Literacy to High Advanced. 



Title II of the WIA also lists 12 criteria for states to
consider when funding adult education and literacy
activities. Among these criteria are establishing performance
measures for learner outcomes, determining past
effectiveness in meeting or exceeding these performance measures,
and maintaining a high-quality information management
system for reporting learner outcomes and monitoring
program performance against the established measures. For
measuring level gain, the NRS implementation document
states that a standardized assessment procedure (e.g., a test
or a performance assessment) is to be used.

How are states meeting these requirements? 

To meet these criteria, each state has set its own
performance standards in consultation with ED, indicating the
percentage of learners that should progress from level to
level in funded programs or across the state as a whole. A
state can set different standards for different service
providers or for different levels of proficiency. For example, the
percentage of learners expected to move from ESL Literacy
to Beginning ESL could be lower than the percentage
expected to move from Beginning ESL to Low Intermediate.
This recognizes that a learner who enters a program with no
literacy skills may require a great deal of instruction before
showing level gain. Each state is evaluated by ED according
to the state's own performance standards. A few states (e.g.,
California) have instituted performance-based contracts by
which programs receive money only for the learners who
make certain gains.
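
As a rough illustration of how a program might compare its results with
standards of this kind, the sketch below computes the percentage of
learners who advanced from each entry level and checks it against a
target. The learner records, the target percentages, and the function
name are hypothetical; they are not drawn from any state's actual plan.

```python
from collections import defaultdict

# Hypothetical targets: percent of learners expected to advance at least
# one level, keyed by entry level. These figures are invented.
TARGETS = {"ESL Literacy": 25.0, "Beginning ESL": 35.0}


def level_gain_rates(records):
    """Return {entry_level: percent of learners who advanced a level}.

    Each record is an (entry_level, advanced) pair; advanced is a bool.
    """
    enrolled = defaultdict(int)
    advanced = defaultdict(int)
    for entry_level, did_advance in records:
        enrolled[entry_level] += 1
        if did_advance:
            advanced[entry_level] += 1
    return {lvl: 100.0 * advanced[lvl] / enrolled[lvl] for lvl in enrolled}


# Invented data for one funding year.
records = [
    ("ESL Literacy", True), ("ESL Literacy", False),
    ("ESL Literacy", False), ("ESL Literacy", False),
    ("Beginning ESL", True), ("Beginning ESL", True),
    ("Beginning ESL", False), ("Beginning ESL", False),
]

rates = level_gain_rates(records)
for level, rate in rates.items():
    status = "met" if rate >= TARGETS[level] else "not met"
    print(f"{level}: {rate:.1f}% advanced, target {TARGETS[level]:.1f}%, {status}")
```

Keeping a separate target for each entry level mirrors the point made
above that a state may expect a smaller share of learners to move out of
ESL Literacy than out of Beginning ESL.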

States have also designated specific assessment tools or
processes that programs may use to show level gain. These
tools and processes vary among the states. Most states have
chosen a standardized test (e.g., California: Comprehensive
Adult Student Assessment System [CASAS], Texas: Basic
English Skills Test [BEST], and New York: New York State
Placement Test [NYSPlace]); several give choices among a
list of approved tests (e.g., Arkansas: BEST or CASAS); and a
few allow a standardized test for initial-level determination
and then a competency checklist or uniform portfolio for
exit-level determination (e.g., Florida and Ohio). For
contact information for the BEST, CASAS, and NYSPlace, see
"Adult ESL Tests" (p. 4).

What are the issues in testing level gain?

NRS level descriptors 

Programs are required to report the percentage of
learners that move from level to level during the funding year.
However, no research establishes how long it takes
to advance one NRS level. Because it takes several years to
learn a language well (Thomas & Collier, 1997), such
information is crucial in high-stakes assessment. The time
it takes to show level gain on a proficiency scale depends
on both program and learner factors. Program factors
include intensity of the classes (how long and how many
times per week); training and experience of the instructors;
adequacy of facilities (e.g., comfortable, adequate lighting);
and resources available to both instructors and learners.
Learner factors include educational background, degree of
literacy in native language, age, experience with trauma,
and opportunities to use the language outside of
instructional time. Stakeholders need to know under what
conditions (with which combinations of learner and program
factors) NRS level gains are achievable.

Standardized testing 

One way to test language development is through the
use of standardized tests, which are developed according to
explicit specifications. Test items are chosen for their ability
to discriminate among levels, and administration
procedures are consistent and uniform. Pencil-and-paper
standardized tests are often used because they are easy to
administer to groups, require minimal training for the test
administrator, and have documentation of reliability
(consistency of results over time) and validity (measuring what
the test says it measures) (Holt & Van Duzer, 2000).

Despite the advantages, standardized tests have
limitations. Their results will have meaning to learners and
teachers only if the test content is related to the goals and
content of the instruction (Van Duzer & Berdan, 1999).
Adult education programs are often tailored to take
advantage of the few hours (typically 4-8 hours per week) that
adult learners are available to study. Instruction may focus
on a limited number of learner goals (e.g., finding a better
job or helping children with their homework). If the items
in a standardized test reflect the actual curriculum, then the
test may accurately assess achievement of the learners.
However, if the items do not reflect what is covered in the
classroom, the test may not adequately assess what learners
know and can do. Given the focus on real-life, practical
content in adult ESL instruction, using a test that assesses
everyday vocabulary and tasks (e.g., BEST or CASAS) can
yield satisfactory results.

There is concern, however, that standardized tests may
not be able to capture the incremental changes in learning
that occur over short periods of instructional time.
Test-administration manuals usually recommend the minimum
number of hours of instruction that should occur between
pre- and post-testing, yet the learning that takes place
within that time frame is dependent on the program and
learner factors discussed previously. In the effort to make
sure that learners are tested and counted before they leave,
program staff may be post-testing before adequate
instruction has been given. In such cases, learners may not show
enough progress to advance a level unless they pre-tested
near the high end of the score ranges for a particular NRS
level.
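
The effect of a learner's starting point can be sketched numerically. In
the example below, the score bands and the 10-point gain are invented for
illustration; they do not correspond to the actual score ranges of the
BEST, CASAS, or any other test, and the function names are hypothetical.

```python
# Hypothetical score bands: (level name, lowest score in that level),
# listed in ascending order. These cut points are invented.
LEVEL_FLOORS = [
    ("ESL Literacy", 0),
    ("Beginning ESL", 20),
    ("Low Intermediate", 40),
]


def level_for(score):
    """Return the highest level whose floor is at or below the score."""
    current = LEVEL_FLOORS[0][0]
    for name, floor in LEVEL_FLOORS:
        if score >= floor:
            current = name
    return current


def shows_level_gain(pre_score, post_score):
    """True if the post-test score falls in a higher band than the pre-test."""
    order = [name for name, _ in LEVEL_FLOORS]
    return order.index(level_for(post_score)) > order.index(level_for(pre_score))


# Two learners make the same 10-point gain from different starting points.
low_start = shows_level_gain(22, 32)    # pre-tested near the bottom of the band
high_start = shows_level_gain(35, 45)   # pre-tested near the top of the band
print(low_start, high_start)
```

Under these assumed bands, the same measured gain counts as a level
advance for one learner and not for the other, which is the pattern the
paragraph above describes when post-testing happens early.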

Performance assessment

Performance assessments require learners to use prior
knowledge and recent learning to accomplish tasks that
demonstrate what they know and can do. There is a direct
link between instruction and assessment. Examples of
performance assessment tasks include oral or written reports
(e.g., on how to become a citizen); projects (e.g., researching,
producing, and distributing a booklet on recreational
opportunities available in the community); and exhibitions or
demonstrations (e.g., a poster depicting the steps to
becoming a U.S. citizen). A variety of performance assessments
provide a more complete picture of a learner's abilities than
can be gathered from performance on a pencil-and-paper
standardized test.

For adult ESL, performance assessment reflects current
thought about second language acquisition: Learners
acquire language as they use it in social interactions to
accomplish purposeful tasks (e.g., finding information or
applying for a job). The performance may be assessed
simply by documenting the successful completion of the
task or by the use of rubrics designed to assess various
dimensions of carrying out the task (e.g., rating oral
presentation skills on a scale of 1-5). Both instructors and learners
can be involved in the development of evaluation
guidelines and in the evaluation procedure itself (Van Duzer &
Berdan, 1999).

Although performance assessments provide valuable
information to learners, instructors, and other program staff,
their use for accountability purposes is currently limited.
These types of assessment are time consuming to
administer and score. To produce the reliable, quantifiable data
required for high-stakes assessment, performance
assessments would need to be standardized. That is, for each of
the NRS functioning levels, tasks would need to be
developed (and agreed upon) that would represent level
completion; scoring rubrics and guidelines for evaluating
performance would need to be in place; and administrators
and evaluators would need to be trained.



What attempts at standardizing performance 
assessment are being undertaken? 

A few projects are attempting to develop performance 
assessments that would be acceptable for the NRS. 

♦ Ohio is developing a uniform portfolio system of
performance assessment that is being validated by Ohio State
University (Gillette, 2001).

♦ Colorado developed a certificate system based on
performance assessments that was discarded in favor of a
standardized test for NRS reporting. However, the
Colorado Department of Education is working with CASAS to
standardize and validate one level of the Colorado
Certificate of Accomplishment so that it meets the rigors of
high-stakes assessment (K. S. Weddel, personal
communication, December 10, 2001).

♦ The National Institute for Literacy's (NIFL) Equipped for
the Future (EFF) project staff is working with programs in
several states to develop a continuum of performance for
the EFF adult literacy content standards so that
performance assessment tasks can be constructed (Stein, 2001).

♦ ED's Office of Vocational and Adult Education (OVAE) is
supporting two performance assessment projects: (a)
the Test of Emerging Literacy (TEL) is being developed
by American Institutes for Research (AIR) with
additional support from Arizona, Massachusetts, and
Washington, and (b) the BEST Oral Interview is being revised
by the Center for Applied Linguistics (CAL) in order to
assess the full range of NRS functioning levels. (Two
versions of the Interview, one print and one computer
adaptive, will be available Fall 2002.)

♦ OVAE and NIFL are supporting the National Academy of 
Sciences' review of standards for alternative performance 
assessment (National Academies Board on Testing and 
Assessment, 2001). 

For the time being, however, performance assessments
remain difficult and costly to produce for high-stakes
reporting (Wrigley, 2001).

What are the critical questions to be answered? 

The issues discussed in this Q&A point to several critical 
questions that need to be examined to move the field of 
adult education forward in solving the complexities of 
defining learner progress and how to measure it. 

1. What should be counted as success, and how should
it be measured? What learners, instructors, and program
staff count as success may differ from what is measured by
state-mandated assessment procedures. Level gain is just
one possible outcome of instruction. Equally important to
learner success may be an increase in literacy practices (e.g.,
reading a greater variety of print materials, reading to
children); achievement of a personal goal (e.g., passing the
citizenship test, receiving a job promotion); or an increase
in confidence and self-esteem. States are currently able to
count these outcomes in their own evaluation plans if they
so choose. However, a change in the legislation would be
required for the outcomes to be allowable under the
provisions of the WIA.

Stakeholders should work together to identify what
combination of assessments (e.g., standardized,
performance, logs of increased practices, goal attainment,
observations of increased confidence) will yield useful information
for designing, modifying, and improving programs. If
accountability continues to rest mainly on the results of
standardized testing, then there is a need for additional
language-based instruments that measure more than one
skill (i.e., listening, speaking, reading, writing, grammar).
Information about legislative requirements, learner goals
and needs, and assessment specifications (e.g., what is
purported to be measured, reliability study results) should
be clear to each stakeholder. If a legislative change is
warranted, then stakeholders should work with ED and
legislators to have it enacted.

2. How well does the NRS scale facilitate the reporting of
learner progress? The NRS looks at functional level gain as
one of three core indicators by which programs can
measure their success. However, no data are available that
identify how long it takes to make a level gain and under
what conditions (program type, intensity and length of
instruction, resources and support services available). Adult
education services are provided by a wide variety of
institutions (e.g., local education agencies, community colleges,
libraries, community-based and volunteer organizations,
businesses, and unions) under varying conditions. The
complex lives of the learners can leave them with little time
for educational pursuits. The interrelationship among the
time and conditions it takes to make a level gain, the
assessment procedure chosen to measure that gain, and the
resources available to assess it needs to be examined.

3. What is the cost in time, staffing, and funds to
effectively assess and document learning outcomes?
Adult education programs generally have limited operating
funds. The implementation of standardized assessments,
whether in a small program or a large one, requires extra
staffing time, often beyond the limits of the funding
received. In programs with large numbers of learners with low
literacy skills, it is a tremendous challenge just to ensure
that test forms are properly filled out (e.g., name and
identification number) and answers are marked in
appropriate places. Additional costs may be incurred as programs
train staff or hire additional staff to develop, administer, or
score assessments in a way that assures reliable and timely
results.

4. What changes in program design and staff
development are needed to ensure that assessment tools are
reliably used? Even though standardized tests and some
performance assessments have guidelines for
administering and scoring, test administrators may not be following
them. As mentioned above, some programs and states are
post-testing too soon after pre-testing because they are
concerned that learners may leave the program before they
are post-tested. However, learners may not show progress if
they have not had adequate instruction time between test
administrations. To ensure consistent and reliable
assessment, administration procedures need to be carefully
followed and adequate resources need to be allocated for training.

5. How do local, state, and national policies affect
assessment tools and practices and what policies need to
be created? At the national level, the WIA and the NRS have
set criteria that states must meet in order to receive federal
funding. States have leeway, however, to set their own
performance measures and select their own assessment
procedures. Not all program staff may be aware of these
policies. Their attitudes towards being required to use
certain assessments may affect the results. What impact
does such a policy have on programs? How does it differ
from what is happening in other states where, to receive
funding, programs are required only to achieve or exceed a
certain percentage of learners making level gain? Are there
differences in results among states requiring certain
assessment tools versus those states that allow programs to choose?

Conclusion 

The United States has made progress over the past decade
in creating a cohesive system of adult education through
legislation such as the Workforce Investment Act and
frameworks such as the National Reporting System for
Adult Education. Finding answers to the questions
presented here will contribute to the evolving system. At the
same time, the political environment that presses for
accountability creates tension with the enormous amount of
time it takes to build such a system. As program staff in both
new and established programs struggle with accountability
issues, they need to advocate for sound assessment policies
at the local, state, and national levels, and for the resources to
implement them.